Abstract

TLA+ is a specification language based on standard set theory and
temporal logic that has constructs for hierarchical proofs. We describe 
how to write TLA+ proofs and check them with TLAPS, the TLA+ Proof 
System. We use Peterson's mutual exclusion algorithm as a simple exam- 
ple to describe the features of TLAPS and show how it and the Toolbox 
(an IDE for TLA+) help users to manage large, complex proofs. 

1 Introduction 

TLA+ [11] is a specification language originally designed for specifying concur- 
rent and distributed systems and their properties. Specifications and properties 
are written as formulas of TLA, a linear-time temporal logic. TLA+ is based 
on TLA and Zermelo-Fraenkel set theory with the axiom of choice; it also adds 
a module system for structuring specifications. More recently, constructs for 
writing proofs have been added to TLA+; these are derived from a hierarchi- 
cal presentation of natural-deduction proofs proposed for writing rigorous hand 
proofs [15]. 

In this paper, we present the main ideas that guided the design of the proof 
language and our experience with using the TLA+ tools for verifying safety
properties of TLA+ specifications. The TLA+ Toolbox is an integrated
development environment (IDE) based on Eclipse for writing TLA+ specifications
and running the TLA+ tools on them, including the TLC model checker and
TLAPS, the TLA+ proof system [5, 20]. In particular, it provides commands to
hide and unhide parts of a proof, allowing a user to focus on a given proof step 
and its context. It is also invaluable to be able to run the model checker on the 
same formulas that one reasons about. 

The TLA+ proof language and TLAPS have been designed to be independent
of any particular theorem prover. All interaction takes place at the level of
TLA+, letting the user focus on the specification of the algorithm being
developed. We do not expect users to have precise knowledge of the inner workings
of the back-end provers that TLAPS uses, although with experience users learn
about the strengths and weaknesses of the different provers (for example, that
SMT solvers excel at arithmetic).

*This work was partially funded by INRIA-Microsoft Research Joint Centre, France.
It was presented at Dagstuhl seminar 12271 "AI Meets Formal Software Development" (July
2-6, 2012). A shorter version of this article appears as [7].

TLAPS has a Proof Manager (PM) that transforms a proof into individ- 
ual proof obligations that it sends to back-end provers. Currently, the main 
back-end provers are Isabelle/TLA+, an encoding of TLA+ as an object logic 
in Isabelle [23], Zenon [4], a tableau prover for classical first-order logic with 
equality, and a back-end for SMT solvers. Isabelle serves as the most trusted 
back-end prover, and when possible, we expect back-end provers to produce a 
detailed proof that is checked by Isabelle. This is currently implemented for the 
Zenon back-end, which can export its proofs as Isar scripts that Isabelle can 
certify. 

We explain how to write and check TLA+ proofs, using a tiny well-known 
example: a proof that Peterson's algorithm [19] implements mutual exclusion. 
We start by writing the algorithm in PlusCal [12], an algorithm language that 
is based on the expression language of TLA+. The PlusCal code is translated
to a TLA+ specification, which is what we reason about. Section 3 introduces 
the salient features of the proof language and of TLAPS with the proof of 
mutual exclusion. Liveness of Peterson's algorithm (processes eventually enter 
their critical section) can also be asserted and proved with TLA+. However, 
liveness reasoning makes full use of temporal logic, and TLAPS cannot yet 
check temporal logic proofs. We therefore discuss only mutual exclusion. 

Section 4 describes how TLA+, TLAPS, and the Toolbox scale to realistic 
examples. Their relation to other proof systems is discussed in Section 5. A 
concluding section summarizes what we have done and our plans for future work. 

2 Modeling Peterson's Algorithm In TLA+ 

Peterson's algorithm is a classic, very simple two-process mutual exclusion al- 
gorithm. We specify the algorithm in TLA+ and prove that it satisfies mutual 
exclusion, meaning that no two processes are in their critical sections at the 
same time.¹

2.1 From PlusCal To TLA+ 

We will write Peterson's algorithm in the PlusCal algorithm language. To do so, 
we have the Toolbox create an empty TLA+ module. We name the two processes
0 and 1, and we define an operator Not so that Not(0) = 1 and Not(1) = 0:

Not(i) ≜ IF i = 0 THEN 1 ELSE 0

The PlusCal code for Peterson's algorithm is shown in Figure 1; it appears in a
comment in the TLA+ module.² The variables statement declares the variables

¹The TLA+ module containing the specification and proof is accessible at the TLAPS Web
page [20].

²The figure shows the pretty-printed version of PlusCal code and TLA+ formulas. As
an example of how they are typed, here is the ASCII version of the variables declaration:
variables flag = [i \in {0, 1} |-> FALSE], turn = 0;
--algorithm Peterson {
   variables flag = [i ∈ {0, 1} ↦ FALSE], turn = 0;
   process (proc ∈ {0, 1}) {
     a0:  while (TRUE) {
     a1:    flag[self] := TRUE;
     a2:    turn := Not(self);
     a3a:   if (flag[Not(self)]) {goto a3b} else {goto cs};
     a3b:   if (turn = Not(self)) {goto a3a} else {goto cs};
     cs:    skip;  \* critical section
     a4:    flag[self] := FALSE;
          }  \* end while
   }  \* end process
}  \* end algorithm

Figure 1: Peterson's algorithm in PlusCal. 

and their initial values. For example, the initial value of flag is an array such that
flag[0] = flag[1] = FALSE. (Mathematically, an array is a function; the TLA+
notation [x ∈ S ↦ e] for writing functions is similar to a lambda expression.)
To specify a multiprocess algorithm, it is necessary to specify what its atomic
actions are. In PlusCal, an atomic action consists of the execution from one 
label to the next. With this brief explanation, the reader should be able to 
figure out what the code means. 

The PlusCal translator, accessible through a Toolbox menu, generates a 
TLA+ specification from the PlusCal code of the algorithm. Figure 2 gives the 
generated TLA+ translation.³ The PlusCal compiler adds a variable pc, which
explicitly records the control state of each process. For example, control in
process i is at cs iff pc[i] equals the string "cs".

The heart of the TLA+ specification consists of the initial predicate Init, 
which describes the initial state, and the next-state relation Next, which de- 
scribes how the state can change. Given the PlusCal code, the meaning of 
formula Init in the figure is straightforward. The formula Next is a predicate on 
old-state/new-state pairs. Unprimed variables refer to the old state and primed 
variables to the new state. Formula Next is the disjunction of the two formulas 
proc(0) and proc(1), and each proc(self) is the disjunction of seven formulas,
one for each label in the body of the process. The formula a0(self) specifies
the state change performed by process self executing an atomic action starting
at label a0, and similarly for the other six labels. (If f is a function, the TLA+
notation [f EXCEPT ![arg] = exp] denotes the function that is equal to f except
that it maps arg to exp.) The reader should be able to figure out the meaning
of the TLA+ notation and of formula Next by comparing these seven definitions 
with the corresponding PlusCal code. 
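
To make the old-state/new-state reading concrete, here is a minimal Python sketch (not part of the TLA+ module) that models a state as a dictionary and the formula a1(self) as a predicate on a pair of states. The helper name except_ and the dictionary encoding are assumptions made for this illustration only.

```python
def except_(f, arg, exp):
    """Return a copy of the function f (a dict) that maps arg to exp,
    mirroring the TLA+ expression [f EXCEPT ![arg] = exp]."""
    g = dict(f)
    g[arg] = exp
    return g


def a1(old, new, self):
    """The formula a1(self) read as a predicate on an old-state/new-state pair."""
    return (old["pc"][self] == "a1"
            and new["flag"] == except_(old["flag"], self, True)
            and new["pc"] == except_(old["pc"], self, "a2")
            and new["turn"] == old["turn"])


# A TLA+ function with domain {0, 1} becomes a dict with keys 0 and 1.
old = {"flag": {0: False, 1: False}, "turn": 0, "pc": {0: "a1", 1: "a0"}}
new = {"flag": {0: True, 1: False}, "turn": 0, "pc": {0: "a2", 1: "a0"}}

print(a1(old, new, 0))  # process 0 executes the action at label a1
print(a1(old, new, 1))  # process 1 is not at label a1 in the old state
print(a1(old, old, 0))  # a pair that leaves flag unchanged is not an a1 step
```

The predicate does not compute the new state; it only accepts or rejects a given pair, which is how an action formula is interpreted.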

The temporal formula Spec is the complete specification. It is satisfied by a 
behavior (i.e., an ω-sequence of states) iff the behavior starts in a state satisfying
Init and each of its steps (pairs of successive states) either satisfies Next or else 

³For clarity of presentation, we have simplified the translation slightly by "in-lining" a
definition. The proof we develop works for the unmodified translation if we add a global 
declaration that causes the definition to be expanded throughout the proof. 
VARIABLES flag, turn, pc
vars ≜ ⟨flag, turn, pc⟩

Init ≜ ∧ flag = [i ∈ {0, 1} ↦ FALSE]
       ∧ turn = 0
       ∧ pc = [self ∈ {0, 1} ↦ "a0"]

a0(self) ≜ ∧ pc[self] = "a0"
           ∧ pc' = [pc EXCEPT ![self] = "a1"]
           ∧ UNCHANGED ⟨flag, turn⟩

a1(self) ≜ ∧ pc[self] = "a1"
           ∧ flag' = [flag EXCEPT ![self] = TRUE]
           ∧ pc' = [pc EXCEPT ![self] = "a2"]
           ∧ turn' = turn

a2(self) ≜ ∧ pc[self] = "a2"
           ∧ turn' = Not(self)
           ∧ pc' = [pc EXCEPT ![self] = "a3a"]
           ∧ flag' = flag

a3a(self) ≜ ∧ pc[self] = "a3a"
            ∧ IF flag[Not(self)]
                 THEN pc' = [pc EXCEPT ![self] = "a3b"]
                 ELSE pc' = [pc EXCEPT ![self] = "cs"]
            ∧ UNCHANGED ⟨flag, turn⟩

a3b(self) ≜ ∧ pc[self] = "a3b"
            ∧ IF turn = Not(self)
                 THEN pc' = [pc EXCEPT ![self] = "a3a"]
                 ELSE pc' = [pc EXCEPT ![self] = "cs"]
            ∧ UNCHANGED ⟨flag, turn⟩

cs(self) ≜ ∧ pc[self] = "cs"
           ∧ pc' = [pc EXCEPT ![self] = "a4"]
           ∧ UNCHANGED ⟨flag, turn⟩

a4(self) ≜ ∧ pc[self] = "a4"
           ∧ flag' = [flag EXCEPT ![self] = FALSE]
           ∧ pc' = [pc EXCEPT ![self] = "a0"]
           ∧ turn' = turn

proc(self) ≜ a0(self) ∨ a1(self) ∨ a2(self) ∨ a3a(self) ∨ a3b(self)
               ∨ cs(self) ∨ a4(self)

Next ≜ ∃ self ∈ {0, 1} : proc(self)

Spec ≜ Init ∧ □[Next]_vars

Figure 2: A pretty-printed version of the TLA+ translation, slightly simplified. 
leaves the values of the three variables flag, turn, and pc unchanged.⁴ The □
is the ordinary always operator of linear-time temporal logic, and [Next]_vars is
an abbreviation for Next ∨ UNCHANGED vars, where UNCHANGED vars is an
abbreviation for vars' = vars and priming an expression means priming all the 
variables that occur in it. 
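
As a rough illustration of this definition, the following Python sketch checks a finite prefix of a behavior against a formula of the shape Init ∧ □[Next]_vars. The counter specification (init, next_rel) and the helper name satisfies_prefix are hypothetical and chosen only for brevity. Real behaviors are infinite, so a prefix check can refute such a formula but cannot establish it.

```python
def satisfies_prefix(states, init, next_rel):
    """Check a finite prefix of a behavior against Init /\\ [][Next]_vars.

    The first state must satisfy init, and every pair of successive states
    must either satisfy next_rel or be a stuttering step (state unchanged).
    """
    if not states or not init(states[0]):
        return False
    return all(next_rel(s, t) or s == t for s, t in zip(states, states[1:]))


# Hypothetical one-variable specification: x starts at 0 and is incremented.
def init(x):
    return x == 0


def next_rel(x, x_new):
    return x_new == x + 1


print(satisfies_prefix([0, 0, 1, 2, 2, 3], init, next_rel))  # stuttering is allowed
print(satisfies_prefix([0, 2], init, next_rel))              # 0 -> 2 is neither Next nor stuttering
print(satisfies_prefix([1, 2], init, next_rel))              # first state violates Init
```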

2.2 Validation Through Model Checking 

Before trying to prove that the algorithm is correct, we use TLC, the TLA+
model checker, to check it for errors. We first instruct the Toolbox to have TLC
check for "execution errors".⁵ What are type errors in typed languages are one
source of execution errors in TLA+.

The Toolbox runs TLC on a model of a TLA+ specification. A model usually
assigns particular values to specification constants, such as the number N of 
processes. It can also restrict the set of states explored, which is useful if 
the specification allows an infinite number of reachable states. For this trivial 
example, there are no constants to specify and only 58 reachable states. TLC 
finds no execution errors. 

We next check if the algorithm actually satisfies mutual exclusion. Since we
made execution of the critical section an atomic action, mutual exclusion means
that the two processes never both have control at label cs. Mutual exclusion
therefore holds iff the following predicate MutualExclusion is an invariant of the
algorithm, meaning that it is true in all reachable states:

MutualExclusion ≜ (pc[0] ≠ "cs") ∨ (pc[1] ≠ "cs")

TLC reports that the algorithm indeed satisfies this invariant. Peterson's al- 
gorithm is so simple that TLC has checked that all possible executions satisfy 
mutual exclusion. For more interesting algorithms that have an infinite set of 
reachable states, TLC is no longer able to exhaustively verify all executions, and 
correctness must be proved deductively. Still, TLC is invaluable for catching 
errors in the algorithm or its formal model: the effort required for running TLC 
is incomparably lower than that for writing a formal proof. 
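
The kind of exhaustive check described above can be imitated in a few lines of Python. The sketch below is not TLC and makes no claim about how TLC is implemented; it is a plain breadth-first exploration of the states of Peterson's algorithm, with the state encoded as a hypothetical triple (flag, turn, pc) of tuples and with helper names (other, set_at, step, reachable) invented for this illustration.

```python
from collections import deque


def other(i):
    """The operator Not of the specification."""
    return 1 - i


def set_at(tup, i, value):
    """Functional update of a tuple, in the role of [f EXCEPT ![i] = value]."""
    return tup[:i] + (value,) + tup[i + 1:]


def step(state, p):
    """Successor state when process p executes its next atomic action."""
    flag, turn, pc = state
    label = pc[p]
    if label == "a0":
        return (flag, turn, set_at(pc, p, "a1"))
    if label == "a1":
        return (set_at(flag, p, True), turn, set_at(pc, p, "a2"))
    if label == "a2":
        return (flag, other(p), set_at(pc, p, "a3a"))
    if label == "a3a":
        target = "a3b" if flag[other(p)] else "cs"
        return (flag, turn, set_at(pc, p, target))
    if label == "a3b":
        target = "a3a" if turn == other(p) else "cs"
        return (flag, turn, set_at(pc, p, target))
    if label == "cs":
        return (flag, turn, set_at(pc, p, "a4"))
    if label == "a4":
        return (set_at(flag, p, False), turn, set_at(pc, p, "a0"))
    raise ValueError(f"unknown label {label!r}")


INIT = ((False, False), 0, ("a0", "a0"))


def reachable(init):
    """Breadth-first search over all states reachable from init."""
    seen = {init}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        for p in (0, 1):
            succ = step(state, p)
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def mutual_exclusion(state):
    _, _, pc = state
    return pc[0] != "cs" or pc[1] != "cs"


states = reachable(INIT)
violations = [s for s in states if not mutual_exclusion(s)]
print("mutual exclusion holds:", not violations)
```

Stuttering steps are omitted because they never lead to a new state and therefore cannot affect an invariance check.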

3 Proving Mutual Exclusion For Peterson's Algorithm

We now describe a deductive correctness proof of Peterson's algorithm in TLA+.
Proofs of more interesting algorithms follow the same basic structure, but they 
are longer. Section 4 describes how TLA+ proofs scale to larger algorithms. 

⁴"Stuttering steps" that leave all variables unchanged are allowed in order to make
refinement simple [10].

⁵The translation is a temporal logic formula, so there is no obvious definition of an execution
error. An execution error occurs in a behavior if whether or not the behavior satisfies the
formula is not specified by the semantics of TLA+ (for example, because the semantics do
not specify whether or not 0 equals FALSE).
THEOREM Spec ⇒ □MutualExclusion
(1)1. Init ⇒ Inv
(1)2. Inv ∧ [Next]_vars ⇒ Inv'
(1)3. Inv ⇒ MutualExclusion
(1)4. QED

Figure 3: The high-level proof.

3.1 The High-Level Proof 

The assertion that Peterson's algorithm implements mutual exclusion is formal- 
ized in TLA+ as: 

THEOREM Spec ⇒ □MutualExclusion

The standard method of proving this invariance property is to find an inductive 
invariant Inv that implies MutualExclusion. An inductive invariant is one that 
is true in the initial state and whose truth is preserved by the next-state relation. 
TLA+ proofs are hierarchically structured and are generally written top-down.
The top level of this invariance proof is shown in Figure 3. Step (1)2 asserts 
that the truth of Inv is preserved by the next-state relation. 

Each proof in the hierarchy ends with a QED step that asserts the goal of 
that proof, the QED step for the top level asserting the statement of the theorem.
We usually write the QED step's proof first. This QED step follows easily from 
(1)1, (1)2, and (1)3 by propositional logic and the following two temporal-logic 
proof rules: 

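$$\frac{I \land [N]_v \Rightarrow I'}{I \land \Box[N]_v \Rightarrow \Box I} \qquad \frac{P \Rightarrow Q}{\Box P \Rightarrow \Box Q}$$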

However, TLAPS does not yet handle temporal reasoning, so we omit the proof 
of the QED step. When temporal reasoning is added to TLAPS, we expect it
to check such a trivial proof easily.

To continue the proof, we must define the inductive invariant Inv. (A defini- 
tion must precede its use, so the definition of Inv appears in the module before 
the proof.) Figure 4 defines Inv to be the conjunction of two formulas. The 
first, TypeOK, is a "type-correctness" invariant, asserting that the values of all 
variables are elements of the expected sets. (The expression [S → T] is the
set of all functions whose domain is S and whose range is a subset of T.) In
an untyped logic like that of TLA+, almost any inductive invariant must assert

TypeOK ≜ ∧ pc ∈ [{0, 1} → {"a0", "a1", "a2", "a3a", "a3b", "cs", "a4"}]
         ∧ turn ∈ {0, 1}
         ∧ flag ∈ [{0, 1} → BOOLEAN]

I ≜ ∀ i ∈ {0, 1} :
      ∧ pc[i] ∈ {"a2", "a3a", "a3b", "cs", "a4"} ⇒ flag[i]
      ∧ pc[i] ∈ {"cs", "a4"} ⇒ ∧ pc[Not(i)] ∉ {"cs", "a4"}
                                ∧ pc[Not(i)] ∈ {"a3a", "a3b"} ⇒ turn = i

Inv ≜ TypeOK ∧ I

Figure 4: The inductive invariant. 
type correctness. The second conjunct, I, is the interesting one that explains
why Peterson's algorithm implements mutual exclusion. 

There is no point trying to prove that a formula is an inductive invariant 
if TLC can show that it's not even an invariant. So, we first run TLC to test 
if Inv is an invariant. In the simple case of Peterson's algorithm, TLC can 
check not only that it is an invariant, but that it is an inductive invariant. 
We check that Inv is an inductive invariant of Spec by checking that it is an
(ordinary) invariant of the specification Inv ∧ □[Next]_vars, obtained from Spec
by replacing the initial condition by Inv. In most real examples, TLC can at
best check an inductive invariant on a tiny model, one that is too small to gain
any confidence that it really is an inductive invariant. However, TLC can still 
often find simple errors in an inductive invariant. 
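
For a state space this small, inductiveness can also be checked by brute force. The Python sketch below enumerates every type-correct state, keeps those satisfying I, and checks that each step of either process leads back into that set. The tuple encoding of states and the helper names are assumptions of this sketch, not TLC's mechanism; type correctness holds by construction of the enumeration. The last check illustrates why an inductive invariant is needed: MutualExclusion by itself is not preserved by every step from an arbitrary state that satisfies it.

```python
from itertools import product

LABELS = ("a0", "a1", "a2", "a3a", "a3b", "cs", "a4")


def other(i):
    """The operator Not of the specification."""
    return 1 - i


def put(tup, i, value):
    return tup[:i] + (value,) + tup[i + 1:]


def step(state, p):
    """Successor state when process p executes its next atomic action."""
    flag, turn, pc = state
    at, q = pc[p], other(p)
    target = {
        "a0": "a1", "a1": "a2", "a2": "a3a",
        "a3a": "a3b" if flag[q] else "cs",
        "a3b": "a3a" if turn == q else "cs",
        "cs": "a4", "a4": "a0",
    }[at]
    if at == "a1":
        flag = put(flag, p, True)
    elif at == "a4":
        flag = put(flag, p, False)
    elif at == "a2":
        turn = q
    return (flag, turn, put(pc, p, target))


def inv_i(state):
    """The conjunct I of the inductive invariant."""
    flag, turn, pc = state
    for i in (0, 1):
        q = other(i)
        if pc[i] in ("a2", "a3a", "a3b", "cs", "a4") and not flag[i]:
            return False
        if pc[i] in ("cs", "a4"):
            if pc[q] in ("cs", "a4"):
                return False
            if pc[q] in ("a3a", "a3b") and turn != i:
                return False
    return True


def mutex(state):
    return state[2] != ("cs", "cs")


# Every state satisfying TypeOK: 4 flag values x 2 turn values x 49 pc values.
TYPE_OK_STATES = [
    ((f0, f1), turn, (pc0, pc1))
    for f0, f1, turn, pc0, pc1
    in product((False, True), (False, True), (0, 1), LABELS, LABELS)
]
INIT = ((False, False), 0, ("a0", "a0"))

inv_states = [s for s in TYPE_OK_STATES if inv_i(s)]
init_ok = inv_i(INIT)                                                # Init => Inv
step_ok = all(inv_i(step(s, p)) for s in inv_states for p in (0, 1))  # Inv /\ Next => Inv'
excl_ok = all(mutex(s) for s in inv_states)                           # Inv => MutualExclusion
mutex_alone_inductive = all(
    mutex(step(s, p)) for s in TYPE_OK_STATES if mutex(s) for p in (0, 1)
)
print(init_ok, step_ok, excl_ok, mutex_alone_inductive)
```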

3.2 Leaf Proofs for Steps (1)1-(1)3 

We now prove steps (1)1-(1)3. We can prove them in any order; let us start with 
(1)1. We expect this step to follow easily from the definitions of Init and Inv and
simple properties of sets and functions. TLAPS knows about sets and functions, 
but it does not expand definitions unless directed to do so. (In complex proofs, 
automatically expanding definitions often leads to formulas that are too big for 
provers to handle.) We assert that the step follows from simple math and the 
definitions of Init and Inv by writing the following leaf proof immediately after 
the step: 

BY DEF Init, Inv 

We then tell the Toolbox to run TLAPS to check this proof. It does so and 
reports that the prover failed to prove the following obligation: 

ASSUME NEW VARIABLE flag,
       NEW VARIABLE turn,
       NEW VARIABLE pc
PROVE  (/\ flag = [i \in {0, 1} |-> FALSE]
        /\ turn = 0
        /\ pc = [self \in {0, 1} |-> "a0"])
       => TypeOK /\ I

This obligation is exactly what TLAPS's back-end provers are trying to prove.
They are given no other facts. In particular, the provers know nothing about
TypeOK and I, so they obviously can't prove the obligation. We have to tell
TLAPS also to use the definitions of TypeOK and I. We do that by making
the obvious change to the BY proof, after which TLAPS easily proves the step.
Forgetting to expand some definitions is a common mistake, and looking at the 
formula displayed by the Toolbox usually reveals which definitions need to be 
invoked. 

Step (1)3 is proved the same way, by simply expanding the definitions of 
MutualExclusion, Inv, I, and Not. We next try the same technique on (1)2. A 
little thought shows that we have to tell TLAPS to expand all the definitions in 
the module up to and including the definition of Next, except for the definition
of Init. However, when we direct TLAPS to prove the step, it fails to do so, 
reporting a 65-line proof obligation. 
(1)2. Inv ∧ [Next]_vars ⇒ Inv'
  (2)1. SUFFICES ASSUME Inv, Next
                 PROVE  Inv'
  (2)2. TypeOK'
  (2)3. I'
  (2)4. QED

Figure 5: The top-level proof of (1)2. 

TLAPS uses Zenon and Isabelle as its default back-end provers, first trying
Zenon and then trying Isabelle if Zenon fails to find a proof. However, TLAPS
also includes an SMT solver back-end [17] that is capable of handling larger 
"shallow" proof obligations — in particular, ones that do not contain significant 
quantifier reasoning. We instruct TLAPS to use the SMT back-end when prov- 
ing the current step by writing 

BY SMT DEF . . . 

The SMT back-end translates the proof obligation to SMT-LIB [3], the standard
input language for different SMT solvers, and calls an SMT solver (CVC3 by 
default) to try to prove the resulting formula. CVC3 proves step (1)2 in a few 
seconds. Variants of the SMT back-end translate to the native input languages 
of Yices and Z3, which sometimes perform better than does CVC3 using the 
standard SMT-LIB translation. 

3.3 A Hierarchical Proof of Step (1)2 

For sufficiently complicated examples, an SMT solver will not be able to prove 
inductive invariance as a single obligation. The proof will have to be hierarchi- 
cally decomposed. To illustrate how this is done, we now write a proof of (1)2 
that can be checked using only the Zenon and Isabelle back-end provers. 

Step (1)2 and its top-level proof appear in Figure 5. The first step in the 
proof of an implication like this would normally be: 

(2)1. SUFFICES ASSUME Inv, [Next]_vars
               PROVE  Inv'

This step asserts that to prove the current goal, which is step (1)2, it suffices to 
assume that Inv and [Next]_vars are true and prove Inv'. The step also changes
the goal of the rest of the level-2 proof to Inv' and allows the assumptions
Inv and [Next]_vars to be used in the rest of the proof. This step's assertion
is obviously true, and TLAPS will check the one-word leaf proof OBVIOUS.
However, the proof of Figure 5 does something a little different. 

Since the assumption [Next]_vars equals Next ∨ UNCHANGED vars, it leaves
two cases to be proved: (i) Next is true and (ii) all variables are unchanged,
so their primed values equal their unprimed values. The proof in the second
case is trivial, and TLAPS should have no trouble checking it. In Figure 5, the
assumption in the SUFFICES statement is Next rather than [Next]_vars, so the
remainder of the proof only has to consider case (i). To show that it suffices to
prove Inv' under this stronger assumption, the proof of that SUFFICES step has 
to prove case (ii). 
The remainder of the level-2 proof is straightforward. Since Inv equals
TypeOK ∧ I, the goal Inv' is the conjunction of the two formulas TypeOK'
and I'. We therefore decompose the proof by proving each conjunct separately.
The proof of the QED step is simply 

BY (2)2, (2)3 DEF Inv 

Observe that we have to tell TLAPS exactly what facts to use as well as what 
definitions to expand. 

We next prove (2)1-(2)3. Zenon proves (2)1 when the definitions of vars,
Inv, TypeOK, and I are expanded. Note that the definition of Next is not
needed. To prove (2)2 and (2)3, we need to use the definition of Next (that is,
with all definitions expanded down to TLA+ primitives) as well as the
definition of Inv. We also have to use the assumption that Inv and Next are true,
introduced by step (2)1. This leads us to try the following proof for (2)2. 

BY (2)1 DEFS Inv, TypeOK, Next, proc, a0, a1, a2, a3a, a3b, cs, a4, Not

Instead of the reference to step (2)1 in the BY clause, we could also name the 
required facts directly and write 

BY Inv, Next DEFS . . .

The proof manager checks that Inv and Next indeed follow from the currently 
available assumptions. 

Zenon fails on this proof, but Isabelle succeeds. However, both Zenon and 
Isabelle fail on the corresponding proof of (2)3 (which requires also using the
definition of I). To prove it (with only Zenon and Isabelle), we need one more
level of proof. That level appears in Figure 6, which contains the complete proof 
of the theorem. 

Since priming a formula means priming all variables in it, the goal I' has
the form ∀ i ∈ {0, 1} : exp(i)'. A standard way to prove this formula is by
∀-introduction: we introduce a new variable, say j, we assume j ∈ {0, 1}, and
we prove exp(j)'. TLA+ provides a notation for naming subexpressions of a
definition. With that notation, the expression exp(j) is written I!(j). This
leads us to begin the proof of (2)3 with the SUFFICES step (3)1 of Figure 6 and 
its simple proof. 

The assumption Next (introduced by (2)1) equals ∃ self ∈ {0, 1} : proc(self).
A standard way to use such an assumption is by ∃-elimination: we pick some
value of self such that proc(self) is true. That is what step (3)2 does, naming
the value i. 

We simplified our task to proving I!(j)' instead of I', using proc(i) instead
of Next, which eliminates two quantifiers. However, Zenon and Isabelle still
cannot prove the goal in a single step. The usual way to decompose the proof
that process i preserves an invariant is to show that each separate atomic action
of process i preserves the invariant. In mathematical terms, proc(i) is the
disjunction of the seven formulas a0(i), . . . , a4(i), each describing one of the
process's atomic actions. We can decompose the proof by considering each of
the seven formulas as a separate case. 

While this is the usual procedure, Peterson's algorithm is simple enough 
that it is not necessary. Instead, we just have to help the back-end provers by
splitting the proof into the two cases of i = j and i ≠ j. The reader can see
THEOREM Spec ⇒ □MutualExclusion
(1)1. Init ⇒ Inv
  BY DEFS Init, Inv, TypeOK, I
(1)2. Inv ∧ [Next]_vars ⇒ Inv'
  (2)1. SUFFICES ASSUME Inv, Next
                 PROVE  Inv'
    BY DEFS Inv, TypeOK, I, vars
  (2)2. TypeOK'
    BY (2)1 DEFS Inv, TypeOK, Next, proc, a0, a1, a2, a3a, a3b, cs, a4, Not
  (2)3. I'
    (3)1. SUFFICES ASSUME NEW j ∈ {0, 1}
                   PROVE  I!(j)'
      BY DEF I
    (3)2. PICK i ∈ {0, 1} : proc(i)
      BY (2)1 DEF Next
    (3)3. CASE i = j
      BY (2)1, (3)2, (3)3
      DEFS Inv, I, TypeOK, proc, a0, a1, a2, a3a, a3b, cs, a4, Not
    (3)4. CASE i ≠ j
      BY (2)1, (3)2, (3)4
      DEFS Inv, I, TypeOK, proc, a0, a1, a2, a3a, a3b, cs, a4, Not
    (3)5. QED
      BY (3)3, (3)4
  (2)4. QED
    BY (2)2, (2)3 DEF Inv
(1)3. Inv ⇒ MutualExclusion
  BY DEF MutualExclusion, Inv, I, Not
(1)4. QED
  PROOF OMITTED

Figure 6: The complete hierarchical proof.

how this is done in Figure 6. Observe that in the proof of CASE statement (3)3, 
the name (3)3 refers to the case assumption i = j. There is no explicit use of 
(3)1 because a NEW assumption in an ASSUME is used by default in all proofs 
in the assumption's scope. The same is true of the formula i ∈ {0, 1} asserted
by the PICK step. (This is a pragmatic choice in the design of TLAPS, based
on the observation that such facts are used so often.) 

4 Writing Real Proofs 

We have described how one writes and checks a TLA+ proof of a tiny example. 
Several larger case studies have been carried out using the system. These include 
verifications of Byzantine Paxos [14], the Memoir security architecture [18], 
and the lookup and join protocols of the Pastry algorithm for maintaining a 
distributed hashtable over a peer-to-peer network [16]. TLA+ and TLAPS, 
with its Toolbox interface, provide a number of features that help manage the 
complexity of large proofs. 
4.1 Hierarchical Proofs And The Proof Manager

The most important aid in writing large proofs is TLA+'s hierarchical and 
declarative proof language, where intermediate proof obligations are stated ex- 
plicitly. While declarative proofs are more verbose than standard tactic scripts, 
they are easier to understand and maintain because the information on what
is currently being proved is available at each point. Hierarchical proofs enable 
a user to keep decomposing a complex proof into smaller steps until the steps 
become provable by one of the back-end provers.

In logical terms, proof steps correspond to natural-deduction sequents whose 
validity must be established in the current context (containing constant and 
variable symbols, assumptions, and already-established facts). The Proof Man- 
ager tracks the context, which is modified by non-leaf proof steps. For leaf 
proof steps, it sends the corresponding sequent to the back-end provers, and 
records the status of the step's proof (succeeded, failed, canceled by the user, 
or omitted). 

Because proof obligations are independent of one another, users can develop 
proofs in any order and work on the proof of a step independently of the state 
of the proof of other steps. This permits them to concentrate on the part of a 
planned proof that is most likely to be wrong and require changes to other parts. 
The Toolbox makes it easy to instruct TLAPS to check the proof of everything 
in a file, of any single theorem, or of any single step. It displays every obligation 
whose proof fails or is taking too long; in the latter case the user can cancel the 
proof. Clicking on the obligation shows the part of the proof that generated it. 

A linear presentation, as in Figure 6, is unsuitable for reading or writing 
large proofs. The Toolbox's editor helps reading and writing large TLA+ proofs, 
providing commands that show or hide particular subproofs. Commands to hide 
a proof or view just its top level aid in reading a proof. A command that is 
particularly useful when writing a subproof is one that hides all preceding steps 
that cannot be used in that subproof because of their positions in the hierarchy. 

TLA+'s hierarchical proofs provide a much more powerful mechanism for 
structuring complex proofs than the conventional approach using lemmas. In a 
TLA+ proof, each step with a non-leaf proof is effectively a lemma. One typical 
1100-line invariance proof [14] contains 100 such steps. A conventional linear 
proof with 100 lemmas would be impossible to read. 

4.2 Fingerprinting: Tracking The Status Of Proof Obligations

During proof development, a user repeatedly modifies the proof structure or 
changes details of the specification. Rerunning the back-end provers on a sizable 
proof takes time. By default, TLAPS does not re-prove an obligation that it
has already proved, even if the proof has been reorganized and the step that
generated it has been moved, or if the step was removed from the proof and 
reinserted in a later version. It can also show the user the impact of a change 
by indicating which parts of the existing proof must be re-proved. 

The Proof Manager computes a fingerprint of every obligation, which it 
stores, along with the obligation's status, in a separate file. Technically, a proof 
obligation is canonically represented as a lambda term, with bound variables 
replaced by de Bruijn indices [8] such that their actual names in the TLA+ proof 
are irrelevant. The context is minimized by erasing symbols and hypotheses
that are not used in the step. The fingerprint is a compact representation of 
the resulting term, which is therefore insensitive to structural modifications of 
the proof context that do not affect the obligation's logical validity. 
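
The idea of a name-independent fingerprint can be sketched in Python as follows. This is not the Proof Manager's code: the term encoding as nested tuples, the use of SHA-256, and the dictionary used as a status store are all assumptions made for illustration. The sketch only shows why replacing bound variables by de Bruijn indices makes the fingerprint insensitive to renaming.

```python
import hashlib


def to_de_bruijn(term, env=()):
    """Replace bound variable names by de Bruijn indices.

    Terms are nested tuples (a hypothetical encoding for this sketch):
      ("var", name), ("lam", name, body), ("app", fun, arg).
    env lists the enclosing binders, innermost first.
    """
    kind = term[0]
    if kind == "var":
        name = term[1]
        if name in env:
            # Index of the nearest enclosing binder with this name.
            return ("bound", env.index(name))
        return ("free", name)
    if kind == "lam":
        return ("lam", to_de_bruijn(term[2], (term[1],) + env))
    if kind == "app":
        return ("app", to_de_bruijn(term[1], env), to_de_bruijn(term[2], env))
    raise ValueError(f"unknown term kind {kind!r}")


def fingerprint(term):
    """Hash of the canonical (name-free) representation of a term."""
    canonical = repr(to_de_bruijn(term))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two obligations that differ only in the name of a bound variable.
t1 = ("lam", "x", ("app", ("var", "f"), ("var", "x")))
t2 = ("lam", "y", ("app", ("var", "f"), ("var", "y")))
# An obligation that mentions a different free symbol.
t3 = ("lam", "x", ("app", ("var", "g"), ("var", "x")))

status = {fingerprint(t1): "proved"}           # stored after a successful proof
print(status.get(fingerprint(t2), "unknown"))  # the renamed obligation is recognized
print(status.get(fingerprint(t3), "unknown"))  # a different obligation is not
```

Because the lookup is a single hash computation and a dictionary access, consulting the stored status is much cheaper than calling a back-end prover again.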

The Toolbox displays the proof status of each step, indicating by color 
whether the step has been proved or some obligation in its proof has failed 
or has been omitted. Looking up an obligation's status takes little time, so the 
user can tell TLAPS to re-prove a step or a theorem even if only a small part 
of the proof has changed; TLAPS will recognize any obligation that has not 
changed and will not attempt to prove it anew. There is also a check-status 
command that displays the proof status without actually launching any proofs. 

An incident that occurred in the Byzantine Paxos proof reveals the advan- 
tages of our method of writing proofs. The third author wrote the safety proof 
primarily as a way of debugging TLAPS, spending a total of several weeks over 
several months on it. Later, when writing a paper about the algorithm, he 
discovered that it did not satisfy the desired liveness property, so it had to be
modified. He changed the algorithm, fixed minor bugs found by TLC, and re- 
proved the safety property — all in a day and a half, with about 12 hours of actual 
work. He was able to do it that fast because of the hierarchical proof structure, 
TLAPS's fingerprinting mechanism (about 3/4 of the proof obligations in the
new proof had already been proved), and the Toolbox's aid in managing the 
proof. 

5 Related Work 

We have designed the TLA+ proof system as a platform for interactively
verifying concurrent and distributed algorithms. Unlike most interactive proof
assistants [24], TLAPS has been designed around a declarative proof language
that is independent of any specific proof back-end. TLA+ proofs indicate what 
facts are needed to prove a certain result, but they do not specify precisely how 
the back-end provers should use these facts. Although this lack of fine control 
can frustrate users who are intimately familiar with the inner workings of a
particular prover, declarative proofs are less dependent on specific back-end provers
and less sensitive to changes in their implementation. 

We write complex proofs by hierarchically structuring their logic. The graph- 
ical user interface provides commands that support hierarchical proofs by allow- 
ing a user to zoom in on the current context and by supporting non-linear proof 
development. Although some other interactive proof systems such as Mizar [21] 
and Isabelle/Isar [22] also offer hierarchical proofs, to the best of our knowledge
these systems do not provide the Toolbox's abilities to use that structure to 
aid in reading and writing proofs and to prove individual steps in any order — 
facilities that we find crucial in developing and managing large proofs. The 
only other proof assistant that we know to offer a mechanism comparable to 
our fingerprinting facility is the KIV system [2]. 

The Rodin toolset supporting the Event-B formal method [1] shares several 
aspects with TLAPS: Event-B and TLA+ are both based on set theory, both 
emphasize refinement as a way to structure formal developments, and Rodin and 
TLAPS mechanize proofs of safety properties with the help of different back-end 
provers. Unlike with Event-B models, the structure of TLA+ specifications is 
not fixed: any TLA+ formula can be considered as a system specification or a 
property, and TLAPS does not impose a structure on invariant or refinement 
proofs. 

Provers designed for program verification such as VCC [6] or Why [9] target 
low-level source code rather than high-level specifications of algorithms. They 
are based on generators of verification conditions corresponding to programming 
constructs, which are discharged by invoking powerful automatic provers. User 
interaction is essentially restricted to the choice of suitable program annotations. 

6 Conclusion 

Using the example of Peterson's algorithm, we have presented the main con- 
structs of the TLA+ proof language and, by extension, the ideas underlying 
the language design. That algorithm was chosen because it is well known and 
simple — so simple that we had to eschew the use of the SMT solver back-end so 
we could write a nontrivial proof. We explained in Section 4 why TLA+ proofs 
scale to more complex algorithms and specifications that we do not expect any 
prover to handle automatically. The hierarchical structure of the proof lan- 
guage is essential for giving users flexibility in designing their proof structure, 
and it ensures that individual proof steps are independent of one another. The 
fingerprinting mechanism of TLAPS makes use of this independence by stor- 
ing previously proved results and retrieving them, even when they appear in a 
different context. 

While not illustrating the entire proof language [13], Peterson's algorithm 
does show its main features. Steps correspond to natural-deduction sequents. 
Leaf proofs immediately prove a step, citing the necessary definitions, facts, and 
assumptions. Non-leaf proofs consist of another level of proof steps that end 
with a QED step. This basic structure is oriented towards forward-style proofs, 
but the judicious use of backward chaining (SUFFICES steps) can make proofs 
more readable. Some features of the proof language that do not appear in the 
proof of Figure 6 are constructs for providing a witness to prove an existentially 
quantified formula, introducing local definitions, and specifying facts that can 
be used by the back-end provers even when they are not explicitly mentioned. 
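
As an illustration of two of these constructs, the following minimal sketch 
combines a SUFFICES step with a WITNESS step. It is not taken from the proof 
of Peterson's algorithm and has not been checked with TLAPS. The module name, 
the theorem name, and the formula are hypothetical, and depending on the 
installed back-ends the OBVIOUS leaf proofs may need to name a prover 
explicitly. 

```tla
---------------------- MODULE WitnessSketch ----------------------
EXTENDS Naturals

\* Hypothetical example: every natural number has a larger one.
THEOREM SuccessorExists == \A n \in Nat : \E m \in Nat : m > n
\* Backward chaining: reduce the goal to an arbitrary, fixed n.
<1> SUFFICES ASSUME NEW n \in Nat
             PROVE  \E m \in Nat : m > n
  OBVIOUS
\* Forward step: a fact about the value we intend to use as witness.
<1>1. n + 1 > n
  OBVIOUS
\* Supply the witness; the remaining goal becomes n + 1 > n.
<1> WITNESS n + 1 \in Nat
<1> QED
  BY <1>1
==================================================================
```

The SUFFICES step changes the current goal rather than adding a fact, which 
is why it reads as backward chaining inside an otherwise forward-style proof. 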

Different proof techniques, such as resolution, tableau methods, rewriting, 
and SMT solving offer complementary strengths. Future versions of TLAPS 
will probably add new back-end provers. Adding a new back-end mainly in- 
volves writing a translation from TLA+ to the input language of the prover. 
Such translations can be complex, and there is a legitimate concern about their 
soundness as well as about the soundness of the back-ends themselves. For 
back-ends that can produce proof traces, TLAPS provides the option to certify 
the traces within Isabelle. Proof trace certification has been implemented for 
Zenon, and we plan to implement it for other back-end provers including SMT 
solvers. Still, it is much more likely that a proof is meaningless because of an 
error in the formula we are proving than because of an error in a back-end. 
Soundness also depends on parts of the proof manager. Users who do not trust 
its fingerprinting mechanism can disable it and reprove the entire proof or any 
part of it. The proof manager also carries out some critical transformations, 
such as replacing (a + b)' by (a' + b'). 

We cannot overstate how important it is that TLAPS is integrated with 
the other TLA+ tools, especially the TLC model checker. Checking putative 
invariants and assertions with TLC on finite instances of a specification is much 
more productive than discovering errors during the proof. Users check the exact 
same specifications that appear in their proofs. Less obvious is how useful it is 
that TLC can evaluate TLA+ formulas. When verifying a system, we don't want 
to prove well-known mathematical facts; we want to assume them. However, it 
is easy to make a mistake in formalizing even simple mathematics, and assuming 
the truth of an incorrect formula can lead to an incorrect proof. TLC can usually 
check the exact TLA+ formulas assumed in a proof for a large enough instance 
to make us confident that our formalization of a correct mathematical result is 
indeed correct. 
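
One way to arrange this can be sketched as follows, under stated assumptions: 
the module, the operator DivFact, and the theorem name are hypothetical, and 
the fact is parameterized by a set S only so that it can be evaluated on a 
finite range. The theorem is assumed rather than proved (PROOF OMITTED), 
while TLC can be asked to evaluate an instance such as DivFact(0..20). 

```tla
------------------------ MODULE AssumedFact ------------------------
EXTENDS Naturals

\* Hypothetical fact about integer division, stated over a set S.
DivFact(S) ==
  \A a \in S, b \in S \ {0} : a = b * (a \div b) + (a % b)

\* The system proof assumes the fact for all naturals.
THEOREM DivFactNat == DivFact(Nat)
  PROOF OMITTED

\* TLC cannot enumerate Nat, so we would evaluate a finite instance,
\* for example the constant expression DivFact(0..20).
====================================================================
```

A mistake in the formalization, such as forgetting to exclude b = 0, would 
typically be caught when TLC evaluates the finite instance, before the 
incorrect formula is ever used in a proof. 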

We are actively developing TLAPS. The current version supports reasoning 
about non-temporal formulas, which is enough for proving safety properties, 
including invariants and step simulation. Non-trivial temporal reasoning is re- 
quired for proving liveness properties, and our main short-term objective is to 
support temporal reasoning in TLAPS. It is not obvious how best to extend 
natural deduction to temporal logic. We have designed an approach involving 
two forms of sequents, expressed with two forms of the ASSUME/PROVE 
statement having different semantics, that we think will work well. We also plan to 
improve support for standard TLA+ data structures such as sequences. 